ABSTRACT

The study of the Bayesian learning dynamics with optimizing agents is facilitated by embedding
the individual decision problems within a standard dynamic programming environment. This may be
accomplished by augmenting the state space to include the set of possible beliefs over a parameter
space $\Theta$ representing the set of a priori possible model specifications. But to apply the standard
dynamic programming results on the existence of an optimal policy, it is necessary to establish that
the distribution of the posterior beliefs is a continuous function (in the topology of weak convergence
of probability measures) of the current state variable and chosen action. We develop necessary and
sufficient conditions for the continuity of the posterior distribution map.


1.  INTRODUCTION 

The study of the Bayesian learning dynamics with optimizing agents is facilitated by embedding
the individual decision problems in a standard dynamic programming environment. As first
demonstrated by Hinderer (1970) and Rieder (1975), this may be accomplished by augmenting the state
space to include the set of possible beliefs over a parameter space $\Theta$ representing the set of a priori
possible specifications. But to apply the standard dynamic programming results on the existence of
an optimal policy, as in Blackwell (1965) or Maitra (1968), it is necessary to establish that the
distribution of the next period state variable is a continuous function (in the topology of weak
convergence of probability measures) of the current state variable and chosen action. In the context
of a Bayesian dynamic programming problem this requires that the Bayesian's probability
distribution over next period beliefs vary continuously with her current beliefs and chosen action.
The object of this paper is to find a minimal set of sufficient conditions which yield the requisite
continuity.

A considerable recent literature models agents as solving Bayesian dynamic programming
problems, including work by Aghion, Bolton, Harris and Jullien (1991),
Easley and Kiefer (1988), Easley and Kiefer (1989), Feldman and McLennan (1989), Feldman and
Spagat (1991), Kiefer and Nyarko (1988), Kiefer and Nyarko (1989), McLennan (1984), and
Nyarko (1991). Invariably there is a parameter space $\Theta$, an action (or action/state) space $X$, and an
outcome space $Y$. Let $\mathcal{P}(Y)$, $\mathcal{P}(\Theta)$ and $\mathcal{P}^2(\Theta)$ denote respectively the spaces of probability measures
on $Y$, $\Theta$ and $\mathcal{P}(\Theta)$. Suppose that for parameter $\theta \in \Theta$ and action $x \in X$, the distribution of
outcomes is $\Psi(\theta, x) \in \mathcal{P}(Y)$ and that the distribution of posterior beliefs for prior $\mu \in \mathcal{P}(\Theta)$ is
$\varphi(\mu, x) \in \mathcal{P}^2(\Theta)$. To attain the requisite continuity of the map $\varphi$, a frequent assumption is that with
respect to some reference measure $\nu$ on $Y$, $\Psi(\theta, x)$ has a Radon-Nikodym derivative $f(\theta, x, \cdot)$ with
$f$ jointly continuous in $\theta$, $x$, and $y$.

Unfortunately, requiring that the map $\Psi$ has a representation by a jointly continuous density $f$ is
highly restrictive. The objective of this paper is to determine minimal conditions on $\Psi$ which are
sufficient for the continuity of $\varphi$. Theorem 3.2, the main technical result of this paper, establishes
that if $\Theta$, $X$ and $Y$ are separable metric spaces, then $\varphi$ is continuous if $\Psi$ is continuous when $\mathcal{P}(Y)$ is
endowed with the total variation (i.e., norm) topology. Examples are provided to demonstrate that if
$\Psi$ is merely setwise continuous, then $\varphi$ may fail to be continuous. However, as discussed in Section
5, modulo some relabelling, there is no intrinsic difficulty if some of the components of the outcome
space are deterministic functions of $\theta$ and $x$.

The organization of the paper is as follows. Definitions and notational conventions are provided
in Section 2. The assumptions regarding the spaces $\Theta$, $X$, and $Y$, and a statement of Theorem 3.2
are provided in Section 3, along with counterexamples to the conjecture that setwise convergence
suffices for the continuity of $\varphi$. Section 4 contains the needed lemmas, which are proved in the
Appendix, and the proof of Theorem 3.2. In Section 5 we conclude with some remarks on
application and interpretation of the results.

2.  DEFINITIONS  AND  NOTATION 

The set of real numbers is denoted by $\mathbb{R}$. If $A \subset X$, the indicator function of $A$ is $I_A$. Let $(S, d)$
be a metric space. The Borel $\sigma$-field is denoted by $\mathcal{B}(S)$, the set of Borel probability measures is
$\mathcal{P}(S)$, and the set of Borel measures is $\mathcal{M}(S)$. For $s \in S$, the Dirac measure $\delta_s \in \mathcal{P}(S)$ is defined by
$\delta_s(A) = I_A(s)$ for $A \in \mathcal{B}(S)$. A function $f: S \to \mathbb{R}$ is a Lipschitz function if for some $k < \infty$,
$\sup_{s \neq t} |f(s) - f(t)| / d(s, t) \le k$. The Lipschitz seminorm $\|f\|_L$ is defined by

$$\|f\|_L = \sup_{s \neq t} \frac{|f(s) - f(t)|}{d(s, t)}.$$

If $f$ is a bounded Lipschitz function, the bounded Lipschitz norm is $\|f\|_{BL} = \|f\|_L + \|f\|_\infty$, where
$\|f\|_\infty$ denotes the usual sup norm. $BL(S, d)$ is the set of all real-valued, bounded Lipschitz functions
on $(S, d)$. Endowed with the bounded Lipschitz norm, $BL(S, d)$ is a Banach space (see, e.g., Dudley
(1989), Section 11.2).

The dual bounded Lipschitz metric on $\mathcal{P}(S)$, denoted by $\beta_S$ (or by $\beta$ if there is no ambiguity), is
defined by $\beta_S(P, Q) = \sup \{ |\int f \, dP - \int f \, dQ| : \|f\|_{BL} \le 1 \}$ for $P, Q \in \mathcal{P}(S)$. If $S$ is separable, $\beta_S$
metrizes the topology of weak convergence on $\mathcal{P}(S)$. Further details on the properties of the dual
bounded Lipschitz metric can be found in Dudley (1966) and Dudley (1989). The total variation
metric on $\mathcal{M}(S)$, denoted by $\tau_S$ (or by $\tau$), is defined for $\eta, \gamma \in \mathcal{M}(S)$ by

$$\tau_S(\eta, \gamma) = \sup_{A \in \mathcal{B}(S)} |\eta(A) - \gamma(A)|.$$

The restriction of $\tau_S$ to $\mathcal{P}(S)$ is also denoted by $\tau_S$. The sequence $\{\mu_n\}$
converges setwise to $\mu \in \mathcal{M}(S)$ if $\mu_n(A) \to \mu(A)$ for all $A \in \mathcal{B}(S)$.
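
To make the total variation metric concrete, the following is a minimal sketch for finitely supported measures only. The dictionary representation and the function name `total_variation` are illustrative assumptions, not part of the paper. It also illustrates why total variation is much stronger than weak convergence: the Dirac measures at 1/n converge weakly to the Dirac measure at 0, yet remain at total variation distance 1 from it.

```python
from fractions import Fraction

def total_variation(p, q):
    """sup_A |p(A) - q(A)| for finite measures given as {point: mass} dicts.

    The supremum is attained either on the set where p exceeds q or on the
    set where q exceeds p, so it is enough to compare those two sums.
    """
    points = set(p) | set(q)
    diffs = [Fraction(p.get(s, 0)) - Fraction(q.get(s, 0)) for s in points]
    pos = sum(d for d in diffs if d > 0)
    neg = -sum(d for d in diffs if d < 0)
    return max(pos, neg)

# Dirac measures at 1/n versus the Dirac measure at 0: distance 1 for every n.
dirac_at_zero = {Fraction(0): 1}
distances = [total_variation({Fraction(1, n): 1}, dirac_at_zero) for n in (1, 10, 100)]
```

For probability measures the two sums coincide, so the value equals half the L1 distance between the mass functions.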

3.  STATEMENT  OF  THEOREM  AND  INADEQUACY  OF  SETWISE  CONVERGENCE 
3.1.  Statement  of  Theorem 

Consider a Bayesian decision-maker with prior belief $\mu_0$ on a parameter space $\Theta$ who must
choose an action $x \in X$. Given a parameter $\theta \in \Theta$ and her action $x \in X$, the decision-maker will
observe an outcome $y \in Y$, the realization of a random element with (unknown) law
$\Psi(\theta, x) \in \mathcal{P}(Y)$. We make the following assumptions:

ASSUMPTION 1: $(\Theta, d_\Theta)$, $(X, d_X)$, and $(Y, d_Y)$ are separable metric spaces with respective Borel
$\sigma$-fields $\mathcal{B}(\Theta)$, $\mathcal{B}(X)$, and $\mathcal{B}(Y)$.

ASSUMPTION 2: The mapping $\Psi: \Theta \times X \to (\mathcal{P}(Y), \tau_Y)$ is continuous, where $\tau_Y$ is the total variation
metric on $\mathcal{P}(Y)$.

Given a prior $\mu$ and action $x$, the induced probability measure on the outcome space $(Y, \mathcal{B}(Y))$ is
$\phi(\mu, x)$ defined by

$$\phi(\mu, x)(A) = \int_\Theta \Psi(\theta, x)(A) \, \mu(d\theta) \quad \text{for } A \in \mathcal{B}(Y).$$

The corresponding probability on $(\Theta \times Y, \mathcal{B}(\Theta) \times \mathcal{B}(Y))$ is $\Phi(\mu, x)$ defined by

$$\Phi(\mu, x)(A \times B) = \int_A \Psi(\theta, x)(B) \, \mu(d\theta) \quad \text{for } A \in \mathcal{B}(\Theta) \text{ and } B \in \mathcal{B}(Y).$$

The posterior belief given prior $\mu$, action $x$, and outcome $y$ is denoted as
$\Gamma(\mu, x, y)$. Dynkin and Yushkevich (1979) establish the joint measurability of $\Gamma$ by extending a
standard proof of the existence of a regular version of conditional probability. The formal statement
of their result is:

LEMMA 3.1. There is a measurable function $\Gamma: \mathcal{P}(\Theta) \times X \times Y \to \mathcal{P}(\Theta)$ such that for all
$(\mu, x) \in \mathcal{P}(\Theta) \times X$, $\Gamma(\mu, x, \cdot)$ is a regular version of $\Phi(\mu, x)(\,\cdot \mid \mathcal{B}(Y))$.

Proof. Follows directly from the discussion on p. 263 of Dynkin and Yushkevich (1979). ■

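For given $(\mu, x) \in \mathcal{P}(\Theta) \times X$ and Borel set $D \subset \mathcal{P}(\Theta)$, the Bayesian's prior probability of
$\{\Gamma(\mu, x, y) \in D\}$ is $\varphi(\mu, x)(D)$, where the mapping $\varphi: \mathcal{P}(\Theta) \times X \to \mathcal{P}^2(\Theta)$ is defined by

$$\varphi(\mu, x)(D) = \int I_D(\Gamma(\mu, x, y)) \, \phi(\mu, x)(dy).$$

The main technical result of this paper is:

THEOREM 3.2. Endowing $\mathcal{P}(\Theta)$ and $\mathcal{P}^2(\Theta)$ with the topology of weak convergence,
$\varphi: \mathcal{P}(\Theta) \times X \to \mathcal{P}^2(\Theta)$ is continuous.
Proof. See Section 4.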
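
As an illustration of the three objects just defined, the sketch below computes the outcome marginal, the posterior, and the law of the posterior when both the parameter space and the outcome space are finite and the action is held fixed. The finite setting, the dictionary representation, and the function names are assumptions made for illustration; the paper works with general separable metric spaces.

```python
from fractions import Fraction

def outcome_marginal(mu, psi):
    """phi(mu, x)(y) = sum over theta of psi[theta][y] * mu[theta], for a fixed action x."""
    ys = {y for dist in psi.values() for y in dist}
    return {y: sum(mu[t] * psi[t].get(y, 0) for t in mu) for y in ys}

def posterior(mu, psi, y):
    """Gamma(mu, x, y) by Bayes' rule, as a tuple of (theta, mass) pairs sorted by theta."""
    m = outcome_marginal(mu, psi)[y]
    return tuple(sorted((t, mu[t] * psi[t].get(y, 0) / m) for t in mu))

def posterior_distribution(mu, psi):
    """varphi(mu, x): the law of the posterior belief, as {posterior: probability}."""
    law = {}
    for y, m in outcome_marginal(mu, psi).items():
        if m > 0:  # outcomes of marginal probability zero do not affect the law
            post = posterior(mu, psi, y)
            law[post] = law.get(post, 0) + m
    return law

half = Fraction(1, 2)
prior = {"a": half, "b": half}
# Hypothetical outcome laws Psi(theta, x) on Y = {0, 1} for the two parameters.
informative = {"a": {0: Fraction(3, 4), 1: Fraction(1, 4)},
               "b": {0: Fraction(1, 4), 1: Fraction(3, 4)}}
law = posterior_distribution(prior, informative)
```

When the outcome laws do not depend on the parameter, `posterior_distribution` returns a point mass on the prior itself, which is the finite analogue of the no-learning case used in the counterexamples of Section 3.2.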

3.2. Counterexamples to Theorem 3.2 When $\Psi$ Is Setwise But Not Total-Variation Continuous
Counterexample 1: We first provide an example where $\Psi$ is only setwise continuous in $x$ (and
jointly setwise continuous) and $\varphi$ is discontinuous. Suppose $\Theta = \{\theta_0, \theta_1\}$, $X = [0, 1]$, $Y = [0, 1)$
and $m$ is Lebesgue measure on $(Y, \mathcal{B}(Y))$. Let $\delta_i$ denote the Dirac measure with mass on $\theta_i$.
Suppose that $\Psi(\theta_1, 0) = m$, and $\Psi(\theta_0, x) = m$ for all $x \in X$. (So no learning occurs if $x = 0$.) Let $g$
be the density function corresponding to $m$.

We now proceed to specify $\Psi(\theta_1, x)$ for $x > 0$. For $x \in (0, 1]$, define $k(x) = 2 \cdot [x^{-1}]$, where $[c]$
is the largest integer less than or equal to $c$. For $j = 1, 2, \ldots, k(x)$, define the interval
$I_j(x) = ((j - 1)/k(x), \, j/k(x))$; and for $x > 0$, define $\Psi(\theta_1, x)$ as the probability measure with density
$g_x(y) = 2$ for $y \in I_j(x)$ when $j$ is even and $g_x(y) = 0$ otherwise. It is well-known that as $x \to 0$, $g_x$
converges weakly to $g$ in $L^1(Y, \mathcal{B}(Y), m)$. It follows that $\Psi(\theta_1, \cdot)$ is continuous at $(\theta_1, 0)$ when $\mathcal{P}(Y)$
is endowed with the topology of setwise convergence.

We now demonstrate that $\varphi$ is not continuous. Choose $\mu_0$ such that $\mu_0(\{\theta_0\}) = \alpha \in (0, 1)$. Then
$\Gamma(\mu_0, 0, y) = \mu_0$ for all $y \in Y$ and $\varphi(\mu_0, 0) = \delta_{\mu_0}$. But for $x > 0$, $\Gamma(\mu_0, x, y) = \delta_0$ for $y \in I_j(x)$ with
$j$ odd. So $\varphi(\mu_0, x)(\{\delta_0\}) = \alpha/2$ for all $x > 0$ and, as $x \to 0$, $\varphi(\mu_0, x) \not\Rightarrow \varphi(\mu_0, 0)$.
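
The arithmetic of Counterexample 1 can be checked with a short sketch. The function names are illustrative assumptions, and the code only evaluates the outcome law on intervals $[0, a)$ rather than on all Borel sets, so it illustrates the setwise convergence without proving it. The posterior weights come from Bayes' rule: on odd cells the density under $\theta_1$ is 0, and on even cells it is 2 against a density of 1 under $\theta_0$.

```python
from fractions import Fraction
from math import floor

def k(x):
    """Number of cells, k(x) = 2 * floor(1/x), for x in (0, 1]."""
    return 2 * floor(1 / x)

def psi1_interval(x, a):
    """Psi(theta_1, x)([0, a)) for a in [0, 1]: density 2 on even-numbered cells."""
    if x == 0:
        return a  # Psi(theta_1, 0) is Lebesgue measure
    n = k(x)
    width = Fraction(1, n)
    mass = Fraction(0)
    for j in range(2, n + 1, 2):  # even cells ((j-1)/n, j/n)
        lo, hi = (j - 1) * width, j * width
        overlap = max(Fraction(0), min(hi, a) - lo)
        mass += 2 * overlap
    return mass

def posterior_law(alpha, x):
    """varphi(mu_0, x) as {posterior weight on theta_0: probability}."""
    if x == 0:
        return {alpha: Fraction(1)}  # no learning: point mass on the prior
    # Odd cells reveal theta_0; even cells shift weight toward theta_1.
    return {Fraction(1): alpha / 2, alpha / (2 - alpha): 1 - alpha / 2}

a = Fraction(1, 3)
gap = abs(psi1_interval(Fraction(1, 1000), a) - a)  # at most one cell width
```

The outcome probabilities of intervals approach their Lebesgue values as $x \to 0$, while `posterior_law(alpha, x)` is the same two-point law for every $x > 0$ and never approaches the point mass returned at $x = 0$.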

Counterexample 2: An example is now provided where the map $\theta \mapsto \Psi(\theta, x)$ is setwise continuous but
$\mu_n \Rightarrow \mu_0$ does not imply $\varphi(\mu_n, x) \Rightarrow \varphi(\mu_0, x)$. Let $X = \{x_0\}$, $\Theta = [0, 1/2] \cup \{1\}$ and $Y = [0, 1]$.
Suppose $\Psi(0, x_0)$ and $\Psi(1, x_0)$ are defined by $\Psi(0, x_0) = \delta_0$, $\Psi(1, x_0)(\{0\}) = 1/2$ and
$\Psi(1, x_0)(\{1\}) = 1/2$. For $\theta \in (0, 1/2]$, define $\Psi(\theta, x_0)$ as having uniform density on $[0, \theta]$. So for any
sequence $\theta_n \to \theta_0$, $\Psi(\theta_n, x_0) \Rightarrow \Psi(\theta_0, x_0)$; in particular, if $\theta_n \to 0$ then $\Psi(\theta_n, x_0) \to \delta_0 = \Psi(0, x_0)$.

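Now define $\mu_0 \in \mathcal{P}(\Theta)$ by $\mu_0(\{0\}) = \mu_0(\{1\}) = 1/2$ and consider the sequence of priors $\{\mu_n\}$, $n \ge 2$,
where $\mu_n(\{1\}) = 1/2$ and $\mu_n(\{n^{-1}\}) = 1/2$. Observe that $\phi(\mu_0, x_0)(\{0\}) > 0$ and $\Gamma(\mu_0, x_0, 0)(\{0\}) = 2/3$,
and so $\Gamma(\mu_0, x_0, 0) \notin \{\delta_0, \delta_1\}$. But for each $n$, the outcome is completely informative; i.e.,
$\phi(\mu_n, x_0)$-a.s., $\Gamma(\mu_n, x_0, y) \in \{\delta_{1/n}, \delta_1\}$. Since $\delta_{1/n} \Rightarrow \delta_0$, any weak limit of $\{\varphi(\mu_n, x_0)\}$ is
concentrated on $\{\delta_0, \delta_1\}$. So $\{\varphi(\mu_n, x_0)\}$ doesn't converge to $\varphi(\mu_0, x_0)$.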
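
The posteriors in Counterexample 2 can be verified with a small sketch. Representing each outcome law as a set of point masses plus an optional uniform part, and the helper names below, are assumptions made for illustration only.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def psi(theta):
    """Psi(theta, x0) as (atoms, upper): point masses plus an optional U[0, upper] part."""
    if theta == 0:
        return {0: Fraction(1)}, None
    if theta == 1:
        return {0: HALF, 1: HALF}, None
    return {}, theta  # uniform density 1/theta on [0, theta], no atoms

def posterior_at_atom(prior, y):
    """Gamma(prior, x0, y) when the outcome y has positive marginal probability."""
    weights = {t: p * psi(t)[0].get(y, 0) for t, p in prior.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items() if w > 0}

def posterior_off_atoms(prior, y):
    """Gamma(prior, x0, y) for an outcome y charged only by the uniform parts."""
    weights = {t: p / psi(t)[1] for t, p in prior.items()
               if psi(t)[1] is not None and 0 <= y <= psi(t)[1]}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}

mu_0 = {0: HALF, 1: HALF}
mu_4 = {Fraction(1, 4): HALF, 1: HALF}  # the prior mu_n with n = 4

limit_posterior = posterior_at_atom(mu_0, 0)
revealed_high = posterior_at_atom(mu_4, 0)
revealed_low = posterior_off_atoms(mu_4, Fraction(1, 8))
```

Under the limiting prior, observing $y = 0$ leaves the decision-maker with weight 2/3 on $\theta = 0$, whereas under $\mu_4$ every outcome reveals the parameter exactly.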

4.  PROOF  OF  THEOREM 
4.1.  Some  Lemmas  for  Probability  Measures  on  Metric  Spaces 

To prove the Theorem, several intermediate results need to be established. The first, LEMMA
4.1, is an approximation theorem for probability measures. LEMMA 4.2 is a sort of converse to
Scheffé's Theorem. LEMMA 4.3, taken from Dudley (1989), asserts that if $\mu$ is a probability
measure on a separable metric space $S$, then $S$ can be decomposed into a countable collection of
disjoint $\mu$-continuity sets of arbitrarily small diameter.

LEMMA 4.1. Suppose $(S, d)$ is a separable metric space, $\varepsilon > 0$, and $\{A_1, A_2, \ldots\}$ is a disjoint
family of Borel subsets of $S$ with $\bigcup_j A_j = S$ and $\operatorname{diam} A_j < \delta = \varepsilon/5$ for all $j$. Define $B_n = \bigcup_{j=1}^{n} A_j$.
Suppose also that for probability measures $\mu$ and $\eta$ on $(S, \mathcal{B}(S))$ there exists an integer $k$ such that
$\mu(B_k) > 1 - \delta$ and $\sup_{j \le k} |\mu(A_j) - \eta(A_j)| < \delta k^{-1}$. Then $\beta(\mu, \eta) < \varepsilon$.

Proof. See Appendix. ■

LEMMA 4.2. Suppose $(S, d)$ is a metric space and $\{\nu_j\}$ and $\{\eta_j\}$ are sequences of finite Borel
measures on $(S, \mathcal{B}(S))$ such that: (i) $\nu_j \to \nu_0$ in total variation, (ii) $\eta_j \to \eta_0$ in total variation, and (iii)
$\eta_j \ll \nu_j$ for $j = 0, 1, \ldots$. Suppose also that there exists a sequence $\{f_j\}$ of versions of $d\eta_j / d\nu_j$ which are
uniformly bounded in $L^\infty(S, \mathcal{B}(S), \nu_0)$ and let $f$ be a version of $d\eta_0 / d\nu_0$. Then $f_j \to f$ in $L^1(S, \mathcal{B}(S), \nu_0)$
and in $\nu_0$ measure.
Proof. See Appendix. ■


LEMMA 4.3. For any separable metric space $(S, d)$, $\varepsilon > 0$, and $P \in \mathcal{P}(S)$, there exists a sequence
$\{A_j\}$ of disjoint $P$-continuity sets with $\bigcup_j A_j = S$ and $\operatorname{diam} A_j < \varepsilon$ for all $j$.
Proof. This is Lemma 11.7.3 of Dudley (1989). ■

To motivate LEMMA 4.4, suppose that $S$ is a metric space, $A \in \mathcal{B}(S)$ and $\mu_n \Rightarrow \mu_0 \in \mathcal{P}(S)$ with
$\mu_0(A) > 0$. We can define a sequence of conditional measures $\eta_k$ on $(S, \mathcal{B}(S))$ for $k = 0, 1, 2, \ldots$, by

$$\eta_k(C) = \frac{\mu_k(A \cap C)}{\mu_k(A)}.$$

In general, $\eta_k$ doesn't necessarily converge, and if it does converge, the limit
need not be $\eta_0$. But if $A$ is a $\mu_0$-continuity set, then $\eta_k \Rightarrow \eta_0$.
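
A small sketch of how conditioning can fail to be weakly continuous when $A$ is not a continuity set. The particular measures are a hypothetical example, not taken from the paper: $\mu_n$ puts mass 1/2 on each of $1/n$ and $1$, the weak limit $\mu_0$ puts mass 1/2 on each of $0$ and $1$, and $A = (0, 1]$ has the boundary point $0$, which carries $\mu_0$-mass 1/2.

```python
from fractions import Fraction

def condition(mu, in_A):
    """eta(C) = mu(A & C) / mu(A) for a finitely supported mu given as {point: mass}."""
    mass_A = sum(m for s, m in mu.items() if in_A(s))
    return {s: m / mass_A for s, m in mu.items() if in_A(s)}

def integrate(f, mu):
    """Integral of f against a finitely supported measure."""
    return sum(f(s) * m for s, m in mu.items())

HALF = Fraction(1, 2)

def in_A(s):
    return 0 < s <= 1  # A = (0, 1], not a mu_0-continuity set

def mu_n(n):
    """Mass 1/2 at 1/n and at 1; intended for n >= 2 so the two points differ."""
    return {Fraction(1, n): HALF, Fraction(1): HALF}

mu_0 = {Fraction(0): HALF, Fraction(1): HALF}

def f(s):
    return s  # bounded and continuous on [0, 1]

along_sequence = integrate(f, condition(mu_n(1000), in_A))  # 1/2 + 1/2000
at_limit = integrate(f, condition(mu_0, in_A))              # eta_0 is the Dirac measure at 1
```

The integrals along the sequence tend to 1/2 while the integral against the conditioned limit is 1, so the conditional measures do not converge weakly to $\eta_0$ here.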

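LEMMA 4.4. Suppose $S$ is a metric space, $\mu_n \Rightarrow \mu_0 \in \mathcal{P}(S)$, and $A$ is a $\mu_0$-continuity set with
$\mu_0(A) > 0$. For $k = 0, 1, \ldots$, define the probability measure $\eta_k$ on $(S, \mathcal{B}(S))$ by
$\eta_k(C) = \mu_k(A \cap C) / \mu_k(A)$. Then $\eta_k \Rightarrow \eta_0$.

Proof. Let $F \subset S$ be closed; it suffices to show that $\limsup \eta_k(F) \le \eta_0(F)$. Since $\mu_k \Rightarrow \mu_0$ and
$F \cap \operatorname{cl} A$ is closed, $\limsup \mu_k(F \cap \operatorname{cl} A) \le \mu_0(F \cap \operatorname{cl} A) = \mu_0(F \cap A)$, where the equality follows
from $A$ being a $\mu_0$-continuity set. Since $\mu_k(A) \to \mu_0(A) > 0$, we have:

$$\limsup \eta_k(F) \le \limsup \frac{\mu_k(F \cap \operatorname{cl} A)}{\mu_k(A)} \le \frac{\mu_0(F \cap \operatorname{cl} A)}{\mu_0(A)} = \frac{\mu_0(F \cap A)}{\mu_0(A)} = \eta_0(F). \;■$$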

Lemmas 4.5 and 4.6 are respectively variants of Scheffé's Theorem (e.g., Billingsley (1986),
Theorem 16.11) and a generalized Dominated Convergence Theorem in Royden (1988, Proposition
11.18). Since I was unable to locate a reference containing the needed versions of these theorems,
proofs are provided for completeness.

LEMMA 4.5. Let $(S, \mathcal{F}, \nu)$ be a measure space. Suppose: (i) for all $A \in \mathcal{F}$, $\lambda_n(A) = \int_A f_n(s) \, \nu(ds)$ and
$\lambda(A) = \int_A f(s) \, \nu(ds)$ for densities $f_n$ and $f$, and (ii) for $n = 1, 2, \ldots$, $\lambda_n(S) = \lambda(S) < \infty$. Then $f_n \to f$ in
$\nu$ measure is a necessary and sufficient condition for $\tau_S(\lambda_n, \lambda) \to 0$.
Proof. See Appendix. ■

LEMMA 4.6. Let $(S, \mathcal{F})$ be a measurable space and $\{\nu_n\}$ a sequence of measures that converge
setwise to a finite measure $\nu$. Suppose $\{f_n\}$ is a sequence of uniformly bounded, real-valued
functions on $S$ with $f_n \to f$ in $\nu$ measure. Then: $\int f_n(s) \, \nu_n(ds) \to \int f(s) \, \nu(ds)$.

Proof. See Appendix. ■

4.2. Proof of Theorem and Auxiliary Results

PROPOSITION 4.7: If $A$ is a $\mu_0$-continuity set, then the map $\phi_A: \mathcal{P}(\Theta) \times X \to (\mathcal{M}(Y), \tau_Y)$ defined by

$$\phi_A(\mu, x)(B) = \Phi(\mu, x)(A \times B) = \int_A \Psi(\theta, x)(B) \, \mu(d\theta)$$

is continuous at $(\mu_0, x)$ for all $x \in X$.
Proof. See Appendix. ■

COROLLARY 4.8: Endowing $\mathcal{P}(Y)$ with the total variation topology, the map $\phi: \mathcal{P}(\Theta) \times X \to \mathcal{P}(Y)$ is
continuous.

Proof. For all $\mu \in \mathcal{P}(\Theta)$, $\Theta$ is a $\mu$-continuity set and so $\phi_\Theta$ is total variation continuous at all
$(\mu, x) \in \mathcal{P}(\Theta) \times X$. Since $\phi_\Theta(\mu, x) = \phi(\mu, x)$, the result follows. ■

Proof of THEOREM 3.2. Let $g: \mathcal{P}(\Theta) \to \mathbb{R}$ be a bounded, continuous function. It is necessary to
verify that for any sequence $\{(\mu_n, x_n)\}$ with $\mu_n \Rightarrow \mu_0$ and $x_n \to x_0$,

$$\int g(\mu) \, \varphi(\mu_n, x_n)(d\mu) \to \int g(\mu) \, \varphi(\mu_0, x_0)(d\mu).$$

Since $\int g(\mu) \, \varphi(\mu_n, x_n)(d\mu) = \int g(\Gamma(\mu_n, x_n, y)) \, \phi(\mu_n, x_n)(dy)$ and
$\int g(\mu) \, \varphi(\mu_0, x_0)(d\mu) = \int g(\Gamma(\mu_0, x_0, y)) \, \phi(\mu_0, x_0)(dy)$, it suffices to demonstrate that for
$(\mu_n, x_n) \to (\mu_0, x_0)$,

$$\int g(\Gamma(\mu_n, x_n, y)) \, \phi(\mu_n, x_n)(dy) \to \int g(\Gamma(\mu_0, x_0, y)) \, \phi(\mu_0, x_0)(dy).$$

Suppose $(\mu_n, x_n) \to (\mu_0, x_0)$ and pick $\varepsilon > 0$. Defining $\delta = \varepsilon/5$, by Lemma 4.3 there exists a
disjoint cover $A_1, A_2, \ldots$ of $\Theta$, with $\operatorname{diam} A_j < \delta$, consisting of $\mu_0$-continuity sets. Define $\eta_{j,n}$ and
$\nu_n$ on $(Y, \mathcal{B}(Y))$ by $\eta_{j,n}(C) = \Phi(\mu_n, x_n)(A_j \times C) = \phi_{A_j}(\mu_n, x_n)(C)$ and
$\nu_n(C) = \Phi(\mu_n, x_n)(\Theta \times C) = \phi(\mu_n, x_n)(C)$. By Proposition 4.7 and Corollary 4.8, $\eta_{j,n} \to \eta_{j,0}$ and
$\nu_n \to \nu_0$ in total variation.
Define $f_{j,n}$ as the version of $d\eta_{j,n} / d\nu_n$ such that $f_{j,n}(y) = \Gamma(\mu_n, x_n, y)(A_j)$ for all $y \in Y$. Define
$B_m = \bigcup_{j=1}^{m} A_j$, and choose $k$ such that $\phi(\mu_0, x_0)(\{y : \Gamma(\mu_0, x_0, y)(B_k) > 1 - \delta\}) > 1 - \delta/2$.
Defining $D = \{y : \Gamma(\mu_0, x_0, y)(B_k) > 1 - \delta\}$, we have $\phi(\mu_0, x_0)(D) > 1 - \delta/2$.

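We now define a set $E_n \subset Y$ such that $\beta(\Gamma(\mu_n, x_n, y), \Gamma(\mu_0, x_0, y)) < \varepsilon$ for $y \in D \cap E_n$. Let

$$E_n = \Big\{ y : \sup_{1 \le j \le k} |\Gamma(\mu_0, x_0, y)(A_j) - \Gamma(\mu_n, x_n, y)(A_j)| < \delta k^{-1} \Big\}.$$

If $y \in D \cap E_n$, then $\beta(\Gamma(\mu_n, x_n, y), \Gamma(\mu_0, x_0, y)) < \varepsilon$ by Lemma 4.1. By Lemma 4.2, $f_{j,n} \to f_{j,0}$ in
$\nu_0$ measure for all $j$ (and in particular for $j \le k$). So for $n$ sufficiently large, $\nu_0(E_n) > 1 - \delta/2$,
$\nu_0(D \cap E_n) > 1 - \delta$, and $\nu_0(\{y : \beta(\Gamma(\mu_n, x_n, y), \Gamma(\mu_0, x_0, y)) < \varepsilon\}) > 1 - \varepsilon$.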

Define the functions $h_0: Y \to \mathcal{P}(\Theta)$ and $h_n: Y \to \mathcal{P}(\Theta)$ by $h_0(y) = \Gamma(\mu_0, x_0, y)$ and
$h_n(y) = \Gamma(\mu_n, x_n, y)$. From the above paragraph and Theorem 9.2.2 of Dudley (1989), $h_n \to h_0$ in
$\phi(\mu_0, x_0)$ probability (when $\mathcal{P}(\Theta)$ is endowed with the topology of weak convergence). Since $g$ is
continuous, $g(h_n) \to g(h_0)$ in $\phi(\mu_0, x_0)$ probability. By Corollary 4.8, $\phi(\mu_n, x_n) \to \phi(\mu_0, x_0)$ in total
variation and hence setwise. So by Lemma 4.6:

$$\int g(h_n(y)) \, \phi(\mu_n, x_n)(dy) \to \int g(h_0(y)) \, \phi(\mu_0, x_0)(dy),$$

or

$$\int g(\Gamma(\mu_n, x_n, y)) \, \phi(\mu_n, x_n)(dy) \to \int g(\Gamma(\mu_0, x_0, y)) \, \phi(\mu_0, x_0)(dy). \;■$$

5.  SOME  REMARKS  ON  APPLICATIONS 

In many dynamic programming problems it is natural for some components of the state space to be
deterministic functions of previous actions and state variables. But in such settings, if the set $Y$ in
Assumptions 1 and 2 is identified with the state space, Assumption 2 will not be satisfied and
Theorem 3.2 will not be applicable. Fortunately, such difficulties are easily surmounted in a manner
which will be briefly sketched.

Consider a Bayesian dynamic programming problem with parameter space $\Theta$, state space $S$,
action space $A$ and transition map $\xi: \Theta \times S \times A \to \mathcal{P}(S)$. The generalized or Bayesian state space is
$\Sigma = \mathcal{P}(\Theta) \times S$. With this generalized state space, the Bayesian optimization problem is a conventional
dynamic programming problem. Let $\zeta: \Sigma \times A \to \mathcal{P}(\Sigma)$ be the induced generalized transition map. To
apply standard results on the existence of optimal policies, it is necessary to be able to verify that $\zeta$ is
continuous.

In many dynamic programming problems, including those alluded to above, the space $S$ will
have a product representation $S = Y \times Z$. Suppose this is the case and let $S_t = (Y_t, Z_t)$ represent the
time $t$ state variable with $Y_t$ and $Z_t$ respectively taking values in $Y$ and $Z$. Suppose additionally that
$Y_t$ is a sufficient statistic; that is, conditional upon the past state variable action pair $(s_{t-1}, a_{t-1})$ and
outcome $y_t$, the distribution of $Z_t$ is independent of $\theta$. Then all that is required for the distribution of
posterior beliefs to be continuous is that the marginal distribution $\xi_Y: \Theta \times S \times A \to \mathcal{P}(Y)$ defined by
$\xi_Y(\theta, s, a)(B) = \xi(\theta, s, a)(B \times Z)$ be continuous when $\mathcal{P}(Y)$ is endowed with the total variation
topology. Summarizing, $\zeta$ will be continuous if $\xi$ is continuous and the marginal map $\xi_Y$ is total
variation continuous.

There still remains the issue of determining if $\xi_Y$ is total variation continuous. This task is
simplified if there exists a $\sigma$-finite measure $\lambda$ on $(Y, \mathcal{B}(Y))$ such that for all $(\theta, s, a) \in \Theta \times S \times A$,
$\xi_Y(\theta, s, a) \ll \lambda$. Then by Lemma 4.5, a necessary and sufficient condition is that for
$(\theta_n, s_n, a_n) \to (\theta_0, s_0, a_0)$,

$$\frac{d\xi_Y(\theta_n, s_n, a_n)}{d\lambda} \to \frac{d\xi_Y(\theta_0, s_0, a_0)}{d\lambda} \quad \text{in } \lambda \text{ measure.}$$

In particular, suppose $Y \subset \mathbb{R}^m$, $\lambda$ is Lebesgue measure, and $\xi_Y(\theta, s, a)$ can be represented by a
density $f(y, \theta, s, a)$. Then $\xi_Y$ is total variation continuous if and only if $(\theta_n, s_n, a_n) \to (\theta_0, s_0, a_0)$
implies that $f(\cdot, \theta_n, s_n, a_n)$ converges in Lebesgue measure to $f(\cdot, \theta_0, s_0, a_0)$.
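
As a concrete check of the density criterion, the sketch below takes a hypothetical family that is not in the paper: unit-variance normal outcome laws whose mean plays the role of $(\theta, s, a)$. It computes the total variation distance, in the paper's sup-over-sets normalization, both by numerical integration of half the L1 distance between densities and by the closed form available for this family. The integration range and step count are arbitrary choices.

```python
import math

def normal_pdf(y, mean):
    """Density of a unit-variance normal distribution with the given mean."""
    return math.exp(-0.5 * (y - mean) ** 2) / math.sqrt(2.0 * math.pi)

def tv_numeric(mean_a, mean_b, lo=-12.0, hi=12.0, steps=48_000):
    """sup_A |P(A) - Q(A)| = 0.5 * integral of |f - g|, by the midpoint rule."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * h
        total += abs(normal_pdf(y, mean_a) - normal_pdf(y, mean_b))
    return 0.5 * total * h

def tv_closed_form(mean_a, mean_b):
    """For unit-variance normals: 2 * Phi(|a - b| / 2) - 1 = erf(|a - b| / (2 * sqrt(2)))."""
    return math.erf(abs(mean_a - mean_b) / (2.0 * math.sqrt(2.0)))

# Distances shrink to zero as the mean approaches its limit.
distances = [tv_closed_form(0.0, gap) for gap in (1.0, 0.1, 0.001)]
```

Here the densities converge pointwise, hence in Lebesgue measure, as the mean converges, and the distances shrink accordingly. A deterministic outcome component would instead be a Dirac measure whose distance to any shifted copy stays at 1, which is why such components are split off as $Z_t$ above.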

APPENDIX

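Proof of LEMMA 4.1. Select $f \in BL(S, d)$ with $\|f\|_{BL} \le 1$. It suffices to verify that
$|\int_S f(s) \, \mu(ds) - \int_S f(s) \, \eta(ds)| < \varepsilon$. Define $a_n = \inf_{s \in A_n} f(s)$ and $b_n = \sup_{s \in A_n} f(s)$. Since $\|f\|_{BL} \le 1$, it follows
that: (i) $\sup \{ |f(s) - f(t)| : s, t \in A_n \} \le \operatorname{diam} A_n < \delta$, (ii) $b_n - a_n < \delta$, and (iii) $|a_n|, |b_n| \le 1$.
For $n \le k$, since $|\eta(A_n) - \mu(A_n)| < \delta k^{-1}$ and $|a_n| \le 1$,

$$\int_{A_n} f(s) \, \eta(ds) \ge \int_{A_n} a_n \, \eta(ds) = \eta(A_n) \cdot a_n \ge \mu(A_n) \cdot a_n - \delta k^{-1},$$

and

$$\int_{A_n} f(s) \, \mu(ds) \le \mu(A_n) \cdot b_n,$$

we have

$$\int_{A_n} f(s) \, \mu(ds) - \int_{A_n} f(s) \, \eta(ds) \le \mu(A_n) \cdot (b_n - a_n) + \delta k^{-1} \le (\mu(A_n) + k^{-1}) \cdot \delta.$$

Similarly, $\int_{A_n} f(s) \, \eta(ds) - \int_{A_n} f(s) \, \mu(ds) \le (\mu(A_n) + k^{-1}) \cdot \delta$, so

$$\Big| \int_{A_n} f(s) \, \eta(ds) - \int_{A_n} f(s) \, \mu(ds) \Big| \le (\mu(A_n) + k^{-1}) \cdot \delta,$$

and, summing over $n = 1, \ldots, k$,

$$\Big| \int_{B_k} f(s) \, \eta(ds) - \int_{B_k} f(s) \, \mu(ds) \Big| \le [\mu(B_k) + 1] \cdot \delta \le 2\delta.$$

The remaining task is to bound $|\int_{B_k^c} f(s) \, \eta(ds) - \int_{B_k^c} f(s) \, \mu(ds)|$, where $B_k^c$ denotes the complement of
$B_k$. Observe that $\eta(B_k^c) = 1 - \eta(B_k) \le 1 - \mu(B_k) + \delta < 2\delta$. Since $|f(s)| \le 1$,

$$\Big| \int_{B_k^c} f(s) \, \eta(ds) \Big| < 2\delta, \quad \Big| \int_{B_k^c} f(s) \, \mu(ds) \Big| < \delta, \quad \text{and} \quad \Big| \int_{B_k^c} f(s) \, \eta(ds) - \int_{B_k^c} f(s) \, \mu(ds) \Big| < 3\delta.$$

Combining these results,

$$\Big| \int_S f(s) \, \eta(ds) - \int_S f(s) \, \mu(ds) \Big| \le \Big| \int_{B_k^c} f(s) \, \eta(ds) - \int_{B_k^c} f(s) \, \mu(ds) \Big| + \Big| \int_{B_k} f(s) \, \eta(ds) - \int_{B_k} f(s) \, \mu(ds) \Big| < 5\delta = \varepsilon. \;■$$
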


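Proof of LEMMA 4.2. Pick $\varepsilon > 0$, let $\tau_S$ denote the total variation metric on $\mathcal{M}(S)$, and let $\|\cdot\|_\infty$ denote
the $L^\infty(S, \mathcal{B}(S), \nu_0)$ norm. Select $M > 0$ such that $\|f_j\|_\infty \le M$ for all $j$, and choose $k$ such that for
$n > k$, $\tau_S(\nu_n, \nu_0) < \varepsilon / (8M)$ and $\tau_S(\eta_n, \eta_0) < \varepsilon/4$. Since

$$\int |f_n(s) - f(s)| \, \nu_0(ds) \le 2 \cdot \sup_{A \in \mathcal{B}(S)} \Big| \int_A f_n(s) \, \nu_0(ds) - \int_A f(s) \, \nu_0(ds) \Big|,$$

and

$$\Big| \int_A f_n(s) \, \nu_0(ds) - \int_A f(s) \, \nu_0(ds) \Big|
\le \Big| \int_A f_n(s) \, \nu_0(ds) - \int_A f_n(s) \, \nu_n(ds) \Big| + \Big| \int_A f_n(s) \, \nu_n(ds) - \int_A f(s) \, \nu_0(ds) \Big|$$
$$\le 2M \cdot \tau_S(\nu_0, \nu_n) + |\eta_n(A) - \eta_0(A)| \le 2M \cdot \tau_S(\nu_0, \nu_n) + \tau_S(\eta_n, \eta_0) < \varepsilon/2,$$

we conclude that $\int |f_n(s) - f(s)| \, \nu_0(ds) < \varepsilon$. Since $\varepsilon$ is arbitrary, $f_n \to f$ in $L^1(S, \mathcal{B}(S), \nu_0)$ and hence
in $\nu_0$ measure. ■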

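Proof of LEMMA 4.5. To prove sufficiency, suppose $f_n \to f$ in $\nu$-measure but $\tau_S(\lambda_n, \lambda) \not\to 0$. Then
there exist $\varepsilon > 0$ and a subsequence $\{f_{n_k}\}$ with $\sup_{A \in \mathcal{F}} |\lambda_{n_k}(A) - \lambda(A)| > \varepsilon$. But since $f_{n_k} \to f$ in $\nu$ measure,
there exists a further subsequence $f_{n_{k_j}} \to f$ $\nu$-a.e. [see, e.g., Theorem 2.5.3 of Ash (1972)]. From the standard
version of Scheffé's Theorem (as in Billingsley (1986)), $\sup_{A \in \mathcal{F}} |\lambda_{n_{k_j}}(A) - \lambda(A)| \to 0$, contradicting
$\sup_{A \in \mathcal{F}} |\lambda_{n_k}(A) - \lambda(A)| > \varepsilon$ for all $k$.

To prove necessity, suppose $\tau_S(\lambda_n, \lambda) \to 0$. By Theorem 1.1 of Devroye and Györfi (1985),
$2 \cdot \sup_{A \in \mathcal{F}} |\lambda_n(A) - \lambda(A)| = \int |f_n(s) - f(s)| \, \nu(ds)$; and by definition, $\tau_S(\lambda_n, \lambda) = \sup_{A \in \mathcal{F}} |\lambda_n(A) - \lambda(A)|$.
So $f_n \to f$ in $L^1(S, \mathcal{F}, \nu)$ and hence in $\nu$ measure. ■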

Proof of LEMMA 4.6. We first establish that if $f_n \to f$ $\nu$-a.e., then $\int f_n(s) \, \nu_n(ds) \to \int f(s) \, \nu(ds)$.
Choose $K$ such that $|f_n(s)| \le K$ for all $s$ and $n$. Define $E = \{s : f_n(s) \not\to f(s)\}$, so that $\nu(E) = 0$. Define
$g_n, g: S \to \mathbb{R}$ by $g_n(s) = f_n(s)$ and $g(s) = f(s)$ for $s \notin E$, while $g_n(s) = g(s) = 0$ for $s \in E$. Since
$g_n(s) \to g(s)$ for all $s \in S$, by Proposition 11.18 of Royden (1988), $\int g_n(s) \, \nu_n(ds) \to \int g(s) \, \nu(ds)$. Since
$\nu(\{s : g(s) \ne f(s)\}) = 0$, $\int g(s) \, \nu(ds) = \int f(s) \, \nu(ds)$. Furthermore, since $\int_{E^c} |g_n(s) - f_n(s)| \, \nu_n(ds) = 0$
and $\int_E |g_n(s) - f_n(s)| \, \nu_n(ds) \le 2K \cdot \nu_n(E)$, and since $\nu_n(E) \to \nu(E) = 0$ by setwise convergence,

$$\Big| \int g_n(s) \, \nu_n(ds) - \int f_n(s) \, \nu_n(ds) \Big| \le 0 + 2K \cdot \nu_n(E) \to 0.$$

So,

$$\Big| \int f(s) \, \nu(ds) - \int f_n(s) \, \nu_n(ds) \Big|
\le \Big| \int g(s) \, \nu(ds) - \int g_n(s) \, \nu_n(ds) \Big| + \Big| \int g_n(s) \, \nu_n(ds) - \int f_n(s) \, \nu_n(ds) \Big| \to 0,$$

verifying that $\int f_n(s) \, \nu_n(ds) \to \int f(s) \, \nu(ds)$ when $f_n \to f$ $\nu$ almost everywhere.

Extending the proof to the case where $f_n \to f$ in $\nu$ measure follows the subsequence technique used in
the proof of Lemma 4.5. Suppose that for some $\varepsilon > 0$ there exists a subsequence $\{f_{n_k}\}$ with
$|\int f_{n_k}(s) \, \nu_{n_k}(ds) - \int f(s) \, \nu(ds)| > \varepsilon$ for all $k$. But then there exists a subsubsequence $f_{n_{k_j}} \to f$ $\nu$-a.e. By the above
paragraph, $|\int f_{n_{k_j}}(s) \, \nu_{n_{k_j}}(ds) - \int f(s) \, \nu(ds)| \to 0$, contradicting $|\int f_{n_k}(s) \, \nu_{n_k}(ds) - \int f(s) \, \nu(ds)| > \varepsilon$. ■



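Proof of PROPOSITION 4.7: Let $A$ be a $\mu_0$-continuity set, $\mu_n \Rightarrow \mu_0$, and $x_n \to x_0 \in X$. We must
verify that for $B \in \mathcal{B}(Y)$, $\phi_A(\mu_n, x_n)(B) \to \phi_A(\mu_0, x_0)(B)$ uniformly in $B$. If $\mu_0(A) = 0$, then
$\phi_A(\mu_n, x_n)(B) \le \mu_n(A) \to 0 = \phi_A(\mu_0, x_0)(B)$. So we may restrict attention to $\mu_0(A) > 0$.

For $\mu_0(A) > 0$ and $n$ sufficiently large, define $\gamma_n$ and $\gamma_0$ on $(\Theta, \mathcal{B}(\Theta))$ by

$$\gamma_n(B) = \frac{\mu_n(A \cap B)}{\mu_n(A)} \quad \text{and} \quad \gamma_0(B) = \frac{\mu_0(A \cap B)}{\mu_0(A)}.$$

By Lemma 4.4, $\gamma_n \Rightarrow \gamma_0$. Let $\delta_n, \delta_0 \in \mathcal{P}(X)$ denote respectively the Dirac
measures with mass at $x_n$ and $x_0$. Define the probability measures $\eta_n, \eta_0$ on $(\Theta \times X, \mathcal{B}(\Theta) \times \mathcal{B}(X))$ by

$$\eta_n(B \times C) = \frac{\mu_n(A \cap B)}{\mu_n(A)} \cdot \delta_n(C) \quad \text{and} \quad \eta_0(B \times C) = \frac{\mu_0(A \cap B)}{\mu_0(A)} \cdot \delta_0(C).$$

By Theorem 4.4 of Billingsley [1968], $\eta_n \Rightarrow \eta_0$.

For $B \in \mathcal{B}(Y)$, define $\Psi_B: \Theta \times X \to \mathbb{R}$ by $\Psi_B(\theta, x) = \Psi(\theta, x)(B)$, and note that

$$\mu_n(A) \int \Psi_B(\theta, x) \, \eta_n(d(\theta, x)) = \phi_A(\mu_n, x_n)(B) \quad \text{and} \quad \mu_0(A) \int \Psi_B(\theta, x) \, \eta_0(d(\theta, x)) = \phi_A(\mu_0, x_0)(B).$$

A direct consequence of Assumption 2 is the equicontinuity of the family of functions $\{\Psi_B : B \in \mathcal{B}(Y)\}$.
Since $\{\Psi_B : B \in \mathcal{B}(Y)\}$ are equicontinuous and bounded by 1, and $\eta_n \Rightarrow \eta_0$, by Exercise 8, p. 17 of
Billingsley [1968], $\int \Psi_B(\theta, x) \, \eta_n(d(\theta, x)) \to \int \Psi_B(\theta, x) \, \eta_0(d(\theta, x))$ uniformly in $B$. Since
$\mu_n(A) \to \mu_0(A)$ because $A$ is a $\mu_0$-continuity set, it follows that

$$\phi_A(\mu_n, x_n)(B) = \mu_n(A) \int \Psi_B(\theta, x) \, \eta_n(d(\theta, x)) \to \mu_0(A) \int \Psi_B(\theta, x) \, \eta_0(d(\theta, x)) = \phi_A(\mu_0, x_0)(B)$$

uniformly in $B$. ■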